Interprocessor Invocation on a NUMA . . .
Authors
Abstract
On a distributed shared memory machine, the problem of minimizing accesses to remote memory modules is crucial for obtaining high performance. We describe an object-based, parallel programming system called OSMIUM to support experiments with mechanisms for performing invocations on remote objects. The mechanisms we have studied include: non-cached access to remote memory, data migration, and function-shipping using an interprocessor invocation protocol (IIP). Our analyses and experiments indicate that IIP competes well with the alternatives, especially when the structure of user programs requires synchronized access to data structures. While these results are obtained on a NUMA multiprocessor, they are also applicable to systems that use hardware cache coherency techniques. This work is supported in part by ONR/DARPA research contract no. N00014-82-K-0193 and in part by NSF research grant no. CDA-8822724.
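The function-shipping idea named in the abstract can be illustrated with a minimal sketch. It is not OSMIUM's actual IIP, whose details the abstract does not give. The `Node`, `Counter`, and `run_demo` names are hypothetical, and threads stand in for processors. Each node runs the operations shipped to it one at a time, so callers need no explicit lock around the shared object, which is the kind of synchronized access the abstract singles out.

```python
import queue
import threading
from concurrent.futures import Future


class Node:
    """A hypothetical processor node that owns objects in its local memory."""

    def __init__(self, name):
        self.name = name
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def _serve(self):
        # The owner runs every shipped operation itself, one at a time,
        # so operations on its objects are serialized without explicit locks.
        while True:
            item = self._requests.get()
            if item is None:
                break
            func, args, result = item
            try:
                result.set_result(func(*args))
            except Exception as exc:
                result.set_exception(exc)

    def invoke(self, func, *args):
        """Ship `func` to this node and return a Future for its result."""
        result = Future()
        self._requests.put((func, args, result))
        return result

    def shutdown(self):
        self._requests.put(None)
        self._worker.join()


class Counter:
    """A toy object that lives on exactly one node."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value


def run_demo(callers=4, calls_per_caller=250):
    owner = Node("owner")
    counter = Counter()

    def caller():
        for _ in range(calls_per_caller):
            # Send the operation to the data rather than reading and
            # writing the counter's memory from the calling thread.
            owner.invoke(counter.add, 1).result()

    threads = [threading.Thread(target=caller) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    final = owner.invoke(lambda: counter.value).result()
    owner.shutdown()
    return final


if __name__ == "__main__":
    print(run_demo())
```

In this sketch every update to the counter executes on the owning node's thread, so concurrent callers never touch the counter's state directly. The sketch models only the serialization aspect, not the relative memory-access costs that the paper's experiments measure.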
Similar resources
Performance Prediction and Evaluation of Parallel Processing on a NUMA Multiprocessor
Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale shared memory multiprocessor systems, unlike non-scalable Uniform Memory Access (UMA) architectures. Most NUMA multiprocessor operations such as scheduling and synchronizing processes, accessing data from processors to memory models and allocating distributed memory space to different processors, are p...
Experiences with Data Distribution on NUMA Shared Memory Multiprocessors
The choice of a good data distribution scheme is critical to the performance of data-parallel applications on both distributed memory multiprocessors and NUMA shared memory multiprocessors. The high cost of interprocessor communication in distributed memory multiprocessors makes the minimization of communications the predominant issue in selecting data distribution schemes. However, on NUMA multipro...
Kernel-Kernel Communication in a Shared-Memory Multiprocessor
In the standard kernel organization on a shared-memory multiprocessor all processors share the code and data of the operating system; explicit synchronization is used to control access to kernel data structures. Distributed-memory multicomputers use an alternative approach, in which each instance of the kernel performs local operations directly and uses remote invocation to perform remote opera...
Wilson-Dirac Operator Revisited on Multicore Vector Computers
We revisit the Wilson-Dirac operator, also referred to as Dslash, on multicore vector machines. The Wilson-Dirac operator is the major computing kernel in Lattice Quantum ChromoDynamics (LQCD), which is the canonical discrete formalism for Quantum ChromoDynamics (QCD) investigations. QCD is the theory of sub-nuclear particle physics, aiming at modeling the strong nuclear force, which is responsibl...
VMware ESX Server 2 NUMA Support
ESX Server 2 provides memory access optimization for both Intel processors and AMD Opteron processors in server architectures that support NUMA (nonuniform memory access). This white paper provides background on NUMA technologies and a detailed description of the sophisticated NUMA optimizations available in ESX Server 2. The document contains the following sections: • Introduction • What is NU...